Performance Analysis of MapReduce Program in Heterogeneous Cloud Computing
Authors
Abstract
Research on Hadoop is an important part of the cloud computing industry, and Hadoop performance is a key research direction. As foundational work, Hadoop performance analysis can provide an important reference for other performance optimization research. In this paper, building on previous research in server performance analysis, we propose a method for measuring node performance in Hadoop. We describe in detail how to measure the performance value of each node in a heterogeneous Hadoop cluster, and we evaluate the measurement results by running MapReduce programs. The method has also been implemented and evaluated on a real-world Hadoop cluster. Experimental results show that the method accurately measures the performance value of each node. Based on this work, users can gain a comprehensive and objective understanding of their own Hadoop cluster and then optimize and improve it.
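The abstract does not give the measurement formula itself. Purely as an illustration, the sketch below assumes that a node's performance value is the inverse of its mean task runtime on a common benchmark MapReduce job, normalized so the slowest node scores 1.0. That metric is our assumption and is not the authors' method. It then shows one way such values could be used: splitting a fixed number of work units (for example, input blocks) across nodes in proportion to their scores. The function names `node_performance_values` and `proportional_shares`, the node names, and the runtimes are all hypothetical.

```python
from statistics import mean
from typing import Dict, List


def node_performance_values(task_times: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical metric: score each node by the inverse of its mean task
    runtime, normalized so the slowest node scores 1.0."""
    speeds = {}
    for node, times in task_times.items():
        if not times or any(t <= 0 for t in times):
            raise ValueError(f"need positive task times for node {node!r}")
        speeds[node] = 1.0 / mean(times)
    slowest = min(speeds.values())
    return {node: s / slowest for node, s in speeds.items()}


def proportional_shares(perf: Dict[str, float], total_units: int) -> Dict[str, int]:
    """Split total_units of work in proportion to the performance values,
    using largest-remainder rounding so the shares sum to total_units."""
    total_perf = sum(perf.values())
    exact = {n: total_units * p / total_perf for n, p in perf.items()}
    shares = {n: int(x) for n, x in exact.items()}
    leftover = total_units - sum(shares.values())
    # Give the remaining units to the nodes with the largest fractional parts.
    ranked = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for n in ranked[:leftover]:
        shares[n] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical per-task runtimes (seconds) of one benchmark job on three nodes.
    measured = {
        "node-a": [40.0, 42.0, 38.0],
        "node-b": [20.0, 21.0, 19.0],
        "node-c": [80.0, 78.0, 82.0],
    }
    perf = node_performance_values(measured)
    print(perf)                            # node-b scores 4x the slowest node
    print(proportional_shares(perf, 70))   # work units split 20 / 40 / 10
```

Normalizing to the slowest node makes each score readable as "how many times faster than the weakest node". Largest-remainder rounding keeps the integer shares summing exactly to the total, which matters when the units are indivisible blocks.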
Journal: JNW
Volume: 8
Publication year: 2013